pacman::p_load(sf, tidyverse, questionr, janitor, psych, ggplot2, gcookbook, tmap, ggpubr, corrplot, gtsummary, regclass, caret, heatmaply, ggdendro, cluster, factoextra, spdep, ClustGeo, GGally)Regionalisation with Spatially Constrained Cluster Analysis
case study : Regionalisation by Multivariate Water Point Attributes with Non-spatially Constrained and Spatially Constrained Clustering Methods.
1. OVERVIEW
Regionalisation with Spatially Constrained clustering analysis requires similar observations to be grouped according to their statistical attributes and spatial location.
This study focuses on regionalising analysis based on Nigeria’s water points attributes.
1.1 Objectives
Regionalise Nigeria by using the following measures :
Total number of water points by status, i.e. functional, non-functional, and unknown;
Percentage of water points by :
status (functional, non-functional, and unknown);
deployed water technology (hand pump, mechanical pump, stand tap, etc.) ;
usage capacity (1000, 300, 250, 50);
rural or urban.
1.2 Scope of Works
Some of the specific tasks for this study are :
- import the shapefile into R with the appropriate sf method, and save it in a simple feature data frame format;
Three (3) Projected Coordinate Systems of Nigeria, EPSG : 26391, 26392, and 26303.
derive the proportion of functional and non-functional water points at LGA level (i.e. ADM2) by appropriate tidyr and dplyr methods;
combine geospatial and aspatial data frames into a simple feature data frame.
delineate water points measures functional regions by using :
conventional hierarchical clustering.
spatially constrained clustering algorithms.
plot two (2) main types of maps below :
Thematic Mapping
Show the derived water-point measures by appropriate statistical graphics and choropleth mapping technique.
Analytical Mapping
Plot delineated functional regions using non-spatially constrained and spatially constrained clustering algorithms.
2. R PACKAGE REQUIRED
Following are the packages require for this exercise :
importing and Processing Geospatial Data
- sf
st_as_sfc( ) - 3.3.1
st_sf( ) - 3.3.2
st_read( ) - 3.4
st_join( ) - 3.5
- sf
importing and processing non-spatial data
tidyverse
readr
-- read_csv( ) - 3.2.1, 3.2.2.1
-- problems( ) - 3.2.1
-- write_rds( ) - 3.2.1.1, 3.2.2.3
-- read_rds( )- 3.2.1.1
dplyr
-- filter( ) - 3.2.2.1
-- left_join( ) -
-- add_count( ) - 3.3.1.3
-- select( ) - 3.4
-- count( ) -
tidyr
-- replace_na( ) - 3.
plot map for visualisation
tmap
- tmap_mode( ) -
- tm_shape( ) -
- tm_polygons( ) -
- tm_view( ) -
- tm_fill( ) -
- tm_borders( ) -
- tm_style( ) -
- tm_layout( ) -
- qtm( ) -
questionr
- freq( ) - 3.6.3
factoextra
stats
dist( ) - 5.1.2
base
- summary( ) - 3.2.1.2
- duplicated( ) - 3.4.3.3
2.1 Load R Packages into R Environment
Use the code chunk below.
3. GEOSPATIAL DATA
3.1 Acquire Data Source
Aspatial Data
- Download the Nigeria data set in shapefile format via Access WPdx+ Global Data Repository from WPdx Global Data Repositories.
- Rename the title of the data set to “geo_export”.
Geospatial Data
- Download the Nigeria geoBoundaries data set at ADM2 level from geoBoundaries.org or the Humanitarian Data Exchange portal.
- Rename the title of the data set to “nga_admbnda_adm2_osgof_20190417”
3.2 Import Aspatial Data
3.2.4 Create Master File
Usage of the code chunk below :
left_join( ) - dplyr - to combine wp_coord, wp_cond and wp_adm.
wp <- left_join((
left_join(
wp_coord,
wp_cond,
by = c("row_id")
)),
wp_adm, by = c("row_id"))3.2.5 Convert Well Known Text (WKT) Data to SF Data Frame
The “New Georeferenced Column” in wp_rds contains spatial data in a WKT format.
Two (2) steps to convert the WKT data format into an sf data frame.
3.2.5.1 derive new field :: “geometry”
Usage of the code chunk below :
st_as_sfc( ) - sf - to derive a new field “geometry”.
wp$geometry = st_as_sfc(wp$`New Georeferenced Column`)3.2.5.2 convert to SF Data Frame
Usage of the code chunk below :
st_sf( ) - sf - to convert the tibble data frame into sf data frame with crs first set to WGS 84 (EPSG : 4326).
st_crs( ) - sf - to retrieve coordinate reference system from the object.
wp_sf<- st_sf(wp, crs = 4326)
st_crs(wp_sf)Coordinate Reference System:
User input: EPSG:4326
wkt:
GEOGCRS["WGS 84",
ENSEMBLE["World Geodetic System 1984 ensemble",
MEMBER["World Geodetic System 1984 (Transit)"],
MEMBER["World Geodetic System 1984 (G730)"],
MEMBER["World Geodetic System 1984 (G873)"],
MEMBER["World Geodetic System 1984 (G1150)"],
MEMBER["World Geodetic System 1984 (G1674)"],
MEMBER["World Geodetic System 1984 (G1762)"],
MEMBER["World Geodetic System 1984 (G2139)"],
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]],
ENSEMBLEACCURACY[2.0]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
CS[ellipsoidal,2],
AXIS["geodetic latitude (Lat)",north,
ORDER[1],
ANGLEUNIT["degree",0.0174532925199433]],
AXIS["geodetic longitude (Lon)",east,
ORDER[2],
ANGLEUNIT["degree",0.0174532925199433]],
USAGE[
SCOPE["Horizontal component of 3D system."],
AREA["World."],
BBOX[-90,-180,90,180]],
ID["EPSG",4326]]
3.2.5.3 retrieve geometry summary :: wp_sf
Usage of the code chunk below :
st_geometry( ) - sf - to get the geometry summary.
st_geometry(wp_sf)Geometry set for 95008 features
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 2.707441 ymin: 4.301812 xmax: 14.21828 ymax: 13.86568
Geodetic CRS: WGS 84
First 5 geometries:
POINT (6.95009 6.78599)
POINT (7.604793 6.78321)
POINT (7.60024 6.759284)
POINT (7.615451 6.799595)
POINT (7.65991 6.762375)
3.3 Import Boundary Data of Nigeria LGA
Usage of the code chunk below :
st_read( ) - sf - to read simple features.
select( ) - dplyr - to select “shapeName” variable.
bdy_nga <- st_read(dsn = "/jephOstan/ISSS624/class_project/project_2/data/geospatial",
layer = "geoBoundaries-NGA-ADM2",
crs = 4326) %>%
select(shapeName)Reading layer `geoBoundaries-NGA-ADM2' from data source
`D:\jephOstan\ISSS624\class_project\project_2\data\geospatial'
using driver `ESRI Shapefile'
Simple feature collection with 774 features and 5 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 2.668534 ymin: 4.273007 xmax: 14.67882 ymax: 13.89442
Geodetic CRS: WGS 84
problems(bdy_nga)3.3.1 Review Imported File
3.3.1.1 check for missing data
freq.na(bdy_nga$shapeName)missing %
0 0
3.3.1.2 check for duplication :: “shapeName”
Usage of the code chunk below :
duplicated( ) - base - to determine duplicate elements.
freq(duplicated(bdy_nga$shapeName)) n % val%
FALSE 768 99.2 99.2
TRUE 6 0.8 0.8
3.3.1.3 list the duplicated value :: “shapeName”
Usage of the code chunk below :
add_count( ) - dplyr - to count observations by group
filter( ) - dplyr - to retain shapeName that has count not equal to 1.
wp_duplShapeName <- bdy_nga %>%
add_count(bdy_nga$shapeName) %>%
filter(n!=1) %>%
select(-n)
wp_duplShapeNameSimple feature collection with 12 features and 2 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 3.316459 ymin: 6.459038 xmax: 9.020704 ymax: 12.05035
Geodetic CRS: WGS 84
First 10 features:
shapeName bdy_nga$shapeName geometry
1 Bassa Bassa MULTIPOLYGON (((6.708541 7....
2 Bassa Bassa MULTIPOLYGON (((8.823522 10...
3 Ifelodun Ifelodun MULTIPOLYGON (((4.664107 8....
4 Ifelodun Ifelodun MULTIPOLYGON (((4.721977 7....
5 Irepodun Irepodun MULTIPOLYGON (((5.05493 8.0...
6 Irepodun Irepodun MULTIPOLYGON (((4.543349 7....
7 Nasarawa Nasarawa MULTIPOLYGON (((8.554589 11...
8 Nasarawa Nasarawa MULTIPOLYGON (((7.493228 8....
9 Obi Obi MULTIPOLYGON (((8.191919 6....
10 Obi Obi MULTIPOLYGON (((9.008576 8....
3.3.1.4 verify findings in section 3.3.1.3
Usage of the code chunk below :
tmap_mode( ) - tmap - to set tmap mode to static plotting or interactive.
tm_shape( ) - tmap - to specify the shape object.
tm_polygons( ) - tmap - to fill the polygons and draw the polygon borders.
tm_view( ) - tmap - to set the options for the interactive tmap viewer.
tm_fill( ) - tmap - to specify either which colour to be used or which data variable mapped to the colour palette.
tm_borders( ) - tmap - to draw the polygon borders.
tmap_style( ) - tmap - to set the tmap style.
tm_layout( ) - tmap - to set the layout of cartographic map.
tmap_mode("view")tmap mode set to interactive viewing
tm_shape(bdy_nga)+
tm_polygons()+
tm_view(set.zoom.limits = c(6,8))+
tm_shape(wp_duplShapeName)+
tm_fill("shapeName",
n = 6,
style = "jenks")+
tm_borders(alpha = 0.5)+
tmap_style("albatross")+
tm_layout(main.title = "Distribution of Duplicated ShapeName",
main.title.size = 1.3,
main.title.position = "center")tmap style set to "albatross"
other available styles are: "white", "gray", "natural", "cobalt", "col_blind", "beaver", "bw", "classic", "watercolor"
Remarks :
The plot above indicates those duplicated water points are from different Nigeria states.
tmap_mode("plot")tmap mode set to plotting
3.3.1.5 acquire State info for duplicated value
The State info to be combined with the duplicated “shapeName”. This will make all the shapeName unique.
| lga | row_id | headquarter | state | iso3166code | state_dd_coordinates |
|---|---|---|---|---|---|
| Bassa | 94 | Oguma | Kogi | NG.KO.BA | 7.75 6.75 |
| Bassa | 95 | Bassa | Plateau | NG.PL.BA | 9.16667 9.75 |
| Ifelodun | 304 | Share | Kwara | NG.KW.IF | 8.5 5.0 |
| Ifelodun | 305 | Ikirun | Osun | NG.OS.ID | 7.5 4.5 |
| Irepodun | 355 | Omu Aran | Kwara | NG.KW.IR | 8.5 5.0 |
| Irepodun | 356 | Ilobu | Osun | NG.OS.IP | 7.5 4.5 |
| Nasarawa | 519 | Bompai | Kano | NG.KN.NA | 11.5 8.5 |
| Nasarawa | 520 | Nasarawa | Nasarawa | NG.NA.NA | 8.53 7.7 |
| Obi | 546 | Obi | Nasarawa | NG.NA.OB | 8.53 7.7 |
| Obi | 547 | Obarike-Ito | Benue | NG.BE.OB | 7.33333 8.75 |
| Surelere | 693 | Surelere | Lagos | NG.LA.SU | 6.5 3.35 |
| Surelere | 694 | Iresa-Adu | Oyo | NG.OY.SU | 8.07 4.41 |
3.4 Data Wrangling
3.4.1 Edit Duplicated Value :: “shapeName”
bdy_nga$shapeName[c(94,95,304,305,355,356,519,546,547,693,694)] <-
c("Bassa Kogi",
"Bassa Plateau",
"Ifelodun Kwara",
"Ifelodun Osun",
"Irepodun Kwara",
"Irepodun Osun",
"Nasarawa Kano",
"Nasarawa Nasarawa",
"Obi Nasarawa",
"Obi Benue",
"Surulere Lagos",
"Surulere Oyo")Warning in bdy_nga$shapeName[c(94, 95, 304, 305, 355, 356, 519, 546, 547, :
number of items to replace is not a multiple of replacement length
3.4.1.1 validate edited value :: “shapeName”
wp_duplShapeName1 <- bdy_nga %>%
add_count(bdy_nga$shapeName) %>%
filter(n!=1) %>%
select(-n)
wp_duplShapeName1Simple feature collection with 0 features and 2 fields
Bounding box: xmin: NA ymin: NA xmax: NA ymax: NA
Geodetic CRS: WGS 84
[1] shapeName bdy_nga$shapeName geometry
<0 rows> (or 0-length row.names)
3.4.2 Perform Point-in-Polygon Overlay
This step combine both attribute and boundary of the water points into a simple feature object.
3.4.2.1 join Objects :: wp_sf and bdy_nga
Usage of the code chunk below :
st_join( ) - sf - to join sf-class objects based on geometry, namely, wp_sf and bdy_nga.
wp_joined <- st_join(wp_sf, bdy_nga)3.4.2.2 save and read RDS File :: wp_joined
write_rds(wp_joined,"/jephOstan/ISSS624/class_project/project_2/data/geodata/wp_joined.rds",compress = "xz")
wp_joined <- read_rds("/jephOstan/ISSS624/class_project/project_2/data/geodata/wp_joined.rds")3.4.2.3 inspect joined file :: wp_joined
st_geometry(wp_joined)Geometry set for 95008 features
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 2.707441 ymin: 4.301812 xmax: 14.21828 ymax: 13.86568
Geodetic CRS: WGS 84
First 5 geometries:
POINT (6.95009 6.78599)
POINT (7.604793 6.78321)
POINT (7.60024 6.759284)
POINT (7.615451 6.799595)
POINT (7.65991 6.762375)
-- determine reference point :: “shapeName” or “clean_adm2”
wp_refLga <- (wp_joined$shapeName == wp_joined$clean_adm2)
freq(wp_refLga) n % val%
FALSE 29713 31.3 31.3
TRUE 65266 68.7 68.7
NA 29 0.0 NA
Remarks :
There are 29,713 of “FALSE”, which is more than 30% of local government areas’ name mismatched between “shapeName” and “clean_adm2”.
Unlike the Water Point Data Exchange data that involved multitple parties, the geoBoundaries data is sourced from “geoBoundaries: A global database of political administrative boundaries.” Plos one 15, no. 4 (2020): e0231866,
- Hence, the “shapeName” to be used throughout this study.
-- assess uniqueness of each Water Point
wp_joined %>% janitor::get_dupes(shapeName,lat_lon_deg)No duplicate combinations found of: shapeName, lat_lon_deg
Simple feature collection with 0 features and 24 fields
Bounding box: xmin: NA ymin: NA xmax: NA ymax: NA
Geodetic CRS: WGS 84
# A tibble: 0 × 25
# … with 25 variables: shapeName <chr>, lat_lon_deg <chr>, dupe_count <int>,
# row_id <dbl>, lat_deg <dbl>, lon_deg <dbl>, New Georeferenced Column <chr>,
# water_source <chr>, water_source_clean <chr>, water_source_category <chr>,
# water_tech_clean <chr>, water_tech_category <chr>, status_clean <chr>,
# status <chr>, clean_adm1 <chr>, clean_adm2 <chr>,
# water_point_population <dbl>, local_population_1km <dbl>,
# crucialness_score <dbl>, pressure_score <dbl>, usage_capacity <dbl>, …
Remarks :
Each water point observation is unique as there are no duplicate combination of “shapeName” together with “lat_lon_deg”.
-- reveal value :: “status_clean”
freq(wp_joined$status_clean) n % val%
Abandoned 175 0.2 0.2
Abandoned/Decommissioned 234 0.2 0.3
Functional 45883 48.3 54.4
Functional but needs repair 4579 4.8 5.4
Functional but not in use 1686 1.8 2.0
Non-Functional 29385 30.9 34.8
Non-Functional due to dry season 2403 2.5 2.8
Non functional due to dry season 7 0.0 0.0
NA 10656 11.2 NA
-- reveal value :: “crucialness_score”
summary(wp_joined$crucialness_score) Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
0.000 0.130 0.304 0.414 0.628 1.000 6879
-- reveal value :: “is_urban”
freq(wp_joined$is_urban) n % val%
FALSE 75444 79.4 79.4
TRUE 19564 20.6 20.6
-- reveal value :: “water_tech_category”
freq(wp_joined$water_tech_category) n % val%
Hand Pump 58755 61.8 69.2
Mechanized Pump 25644 27.0 30.2
Rope and Bucket 1 0.0 0.0
Tapstand 553 0.6 0.7
NA 10055 10.6 NA
-- reveal value :: “usage_capacity”
freq(wp_joined$usage_capacity) n % val%
50 2 0.0 0.0
250 573 0.6 0.6
300 68789 72.4 72.4
1000 25644 27.0 27.0
3.4.3 Replace “NA” with “Unknown”
mutate( ) - dplyr - to run replace_na( ) function.
- replace_na( ) - tidyr - to replace NAs with “unknown”.
wp_joined1 <- wp_joined %>%
mutate(status_clean = replace_na(status_clean, "Unknown")) %>%
mutate(water_tech_category = replace_na(water_tech_category, "Unknown")) %>%
mutate(status = replace_na(status, "Unknown")) %>%
mutate(water_point_population = replace_na(water_point_population, 0)) %>%
mutate(local_population_1km = replace_na(local_population_1km, 0)) %>%
mutate(crucialness_score = replace_na(crucialness_score, 0)) %>%
mutate(pressure_score = replace_na(pressure_score, 0))3.4.4 Standardise Value
3.4.4.1 combine value :: “status_clean”
wp_joined1 <- wp_joined1 %>%
mutate(status_clean = str_replace(status_clean,"Non functional due to dry season" ,"Non-Functional due to dry season")) %>%
mutate(status_clean = str_replace(status_clean,"Abandoned/Decommissioned/Decommissioned","Abandoned/Decommissioned"))3.4.4.2 review “status_clean”
freq(wp_joined1$status_clean)3.4.4.3 read RDS file :: wp_joined1
wp_joined1 <- read_rds("/jephOstan/ISSS624/class_project/project_2/data/geodata/wp_joined1.rds")3.4.5 Extract Water Point for New Table :: wp_nga
3.4.5.1 extract functional water point
wpt_functional <- wp_joined1 %>%
filter(status_clean %in%
c("Functional",
"Functional but not in use",
"Functional but needs repair"))-- save and read RDS file :: wpt_functional
write_rds(wpt_functional,"/jephOstan/ISSS624/class_project/project_2/data/geodata/wpt_functional.rds",compress = "xz")
wpt_functional <- read_rds("/jephOstan/ISSS624/class_project/project_2/data/geodata/wpt_functional.rds")3.4.5.2 inspect variable and value
-- reveal value :: “status_clean”
freq(wpt_functional$status_clean) n % val%
Functional 45883 88.0 88.0
Functional but needs repair 4579 8.8 8.8
Functional but not in use 1686 3.2 3.2
length(wpt_functional$row_id)[1] 52148
length(wpt_functional$row_id)/length(wp_joined1$row_id)*100[1] 54.88801
Remarks :
The total functional water points is 52,148 which is about 54.89% of total water points.
-- reveal value :: “usage_capacity”
freq(wpt_functional$usage_capacity) n % val%
50 2 0.0 0.0
250 75 0.1 0.1
300 38064 73.0 73.0
1000 14007 26.9 26.9
-- reveal value “usage_capacity” by “status_clean”
wpt_functional %>% count(status_clean, usage_capacity, sort = TRUE)Simple feature collection with 10 features and 3 fields
Geometry type: MULTIPOINT
Dimension: XY
Bounding box: xmin: 2.711632 ymin: 4.302938 xmax: 13.5022 ymax: 13.86331
Geodetic CRS: WGS 84
# A tibble: 10 × 4
status_clean usage_capacity n geometry
* <chr> <dbl> <int> <MULTIPOINT [°]>
1 Functional 300 33687 ((3.064921 7.994882), (3.06…
2 Functional 1000 12124 ((3.080189 7.99252), (3.085…
3 Functional but needs repair 300 3306 ((3.340832 8.037962), (3.34…
4 Functional but needs repair 1000 1271 ((3.373801 7.992051), (3.33…
5 Functional but not in use 300 1071 ((3.046639 8.017765), (2.88…
6 Functional but not in use 1000 612 ((3.088655 8.005296), (3.05…
7 Functional 250 70 ((3.355785 6.498105), (3.67…
8 Functional but not in use 250 3 ((8.032945 6.878883), (7.00…
9 Functional 50 2 ((7.027967 4.765731), (8.92…
10 Functional but needs repair 250 2 ((6.465915 5.826699), (7.93…
-- reveal value :: “crucialness_score”
summary(wpt_functional$crucialness_score == 1) Mode FALSE TRUE
logical 47006 5142
-- determine the total population within 1 km by “crucialness_score”
freq(wpt_functional$crucialness_score == 1) n % val%
FALSE 47006 90.1 90.1
TRUE 5142 9.9 9.9
sum(wpt_functional[wpt_functional$crucialness_score == 1,]$local_population_1km)[1] 11252574
Remarks :
Given the “crucialness_score” is a ratio of current water point users to the total population within 1 km radius thereof :
Currently, 5,142 water points serve the population within a 1 km radius at its capacity limit.
The usage capacity may need to be increased to sustain or improve the growth or development rate within 1km of these water points.
Should the population within 1 km therefrom grow above 11,252,574, there may be multiple repercussions in resources management, urbanisation progress, local food and beverage consumption, local commodity prices, or worst case scenario would be the stability of civil society.
summary(wpt_functional$pressure_score > 1) Mode FALSE TRUE
logical 27469 24679
length(wpt_functional$pressure_score)[1] 52148
24679/52148*100 #percentage of functional waterpoints over their usage limit[1] 47.32492
Remarks :
Given the “pressure_score” is the ratio of the current water point users to the usage capacity thereof :
- 24,679, or 47.32% of functional water points, are currently over their limit of usage capacity.
3.4.5.3 Exploratory Data Analysis (EDA) :: wpt_functional
-- plot “status_clean”
ggplot(data = wpt_functional,
aes(fct_infreq(status_clean), fill=status_clean))+
geom_bar()+
geom_text(
aes(label=after_stat(count)),
stat='count',
nudge_x=-0.25,
vjust=-0.2)+
geom_text(
aes(label= scales::percent(signif(after_stat(count/sum(count))))),
stat='count',
nudge_x=0.25,
vjust=-0.2)+
scale_x_discrete(guide = guide_axis(n.dodge = 2))+
guides(fill=guide_legend (title = 'Status'))
-- plot “water_tech_category”
ggplot(data=wpt_functional,
aes(x=fct_infreq(
water_tech_category)))+
geom_bar(aes(
fill = water_tech_category),
width = 0.8)+
geom_text(aes(
label = ..count..),
stat = "count",
vjust=-0.2,
size = 3.5,
color = "black")+
scale_x_discrete(guide = guide_axis(n.dodge = 2))+
guides(fill=guide_legend (title = 'Water Tech Deployed'))Warning: The dot-dot notation (`..count..`) was deprecated in ggplot2 3.4.0.
ℹ Please use `after_stat(count)` instead.

-- plot “water_source_clean”
ggplot(data=wpt_functional,
aes(x=fct_infreq(
water_source_clean)))+
geom_bar(aes(
fill = water_source_clean),
width = 0.8)+
geom_text(aes(
label = ..count..),
stat = "count",
vjust=-0.2,
size = 3.5,
color = "black")+
scale_x_discrete(guide = guide_axis(
n.dodge = 2))+
guides(fill=guide_legend (
title = 'Source of Water Supply'))
3.4.5.4 add attribute to new data table
wp_nga <- bdy_nga %>%
mutate(`total_wp` = lengths(
st_intersects(bdy_nga, wp_joined1))) %>%
mutate(`wp_functional` = lengths(
st_intersects(bdy_nga, wpt_functional))) %>%
mutate(`pct_functional` = (`wp_functional`/`total_wp`*100))-- replace “NaN” with 0
wp_nga <- wp_nga %>%
mutate(`pct_functional` = replace_na(pct_functional, 0))
summary(wp_nga) shapeName geometry total_wp wp_functional
Length:774 MULTIPOLYGON :774 Min. : 0.0 Min. : 0.00
Class :character epsg:4326 : 0 1st Qu.: 45.0 1st Qu.: 17.00
Mode :character +proj=long...: 0 Median : 96.0 Median : 45.50
Mean :122.7 Mean : 67.36
3rd Qu.:168.8 3rd Qu.: 87.75
Max. :894.0 Max. :752.00
pct_functional
Min. : 0.00
1st Qu.: 32.61
Median : 47.41
Mean : 49.84
3rd Qu.: 66.99
Max. :100.00
3.4.5.5 extract non-functional water point
wpt_nonFunctional <- wp_joined1 %>%
filter(status_clean %in%
c("Abandoned/Decommissioned",
"Non-Functional",
"Non-Functional due to dry season"))-- save and read RDS file :: wpt_nonFuntional
write_rds(wpt_nonFunctional,"/jephOstan/ISSS624/class_project/project_2/data/geodata/wpt_nonFunctional.rds",compress = "xz")
wpt_nonFunctional <- read_rds("/jephOstan/ISSS624/class_project/project_2/data/geodata/wpt_nonFunctional.rds")3.4.5.6 inspect variable and value
-- reveal value :: “status_clean”
freq(wpt_nonFunctional$status_clean) n % val%
Abandoned/Decommissioned 234 0.7 0.7
Non-Functional 29385 91.7 91.7
Non-Functional due to dry season 2410 7.5 7.5
length(wpt_nonFunctional$row_id)[1] 32029
length(wpt_nonFunctional$row_id)/length(wp_joined1$row_id)*100[1] 33.7119
Remarks :
There are 32,204, which is about 33.9% out of total water points.
-- reveal value :: “usage_capacity”
freq(wpt_nonFunctional$usage_capacity) n % val%
250 41 0.1 0.1
300 20586 64.3 64.3
1000 11402 35.6 35.6
-- reveal value “usage_capacity” by “status_clean”
wpt_nonFunctional %>% count(status_clean, usage_capacity, sort = TRUE)Simple feature collection with 7 features and 3 fields
Geometry type: MULTIPOINT
Dimension: XY
Bounding box: xmin: 2.707441 ymin: 4.301812 xmax: 13.4192 ymax: 13.86567
Geodetic CRS: WGS 84
# A tibble: 7 × 4
status_clean usage_capac…¹ n geometry
* <chr> <dbl> <int> <MULTIPOINT [°]>
1 Non-Functional 300 18492 ((3.064526 7.994448), (3…
2 Non-Functional 1000 10852 ((3.083391 7.993231), (3…
3 Non-Functional due to dry season 300 2012 ((3.051752 7.984243), (3…
4 Non-Functional due to dry season 1000 398 ((3.056661 7.985808), (3…
5 Abandoned/Decommissioned 1000 152 ((4.713438 7.891137), (4…
6 Abandoned/Decommissioned 300 82 ((3.199483 8.912549), (2…
7 Non-Functional 250 41 ((3.976195 6.582998), (3…
# … with abbreviated variable name ¹usage_capacity
-- reveal “crucialness_score”
sum(wpt_nonFunctional$local_population_1km)[1] 93999535
sum(wpt_nonFunctional$water_point_population)[1] 46255888
Remarks :
Given the “crucialness_score” is a ratio of current water point users to the total population within a 1 km radius thereof , in the context of non-functional :
- Currently, out of 95,013,340 residents within a 1 km radius, there are 46,710,127 of them is affected by these non-functional water point.
3.4.5.7 EDA :: wpt_nonFunctional
-- plot “status_clean”
ggplot(data = wpt_nonFunctional,
aes(fct_infreq(status_clean),
fill=status_clean))+
geom_bar()+
geom_text(
aes(label=after_stat(count)),
stat='count',
nudge_x=-0.25,
vjust=-0.2)+
geom_text(
aes(label= scales::percent(
signif(
after_stat(
count/sum(count)
)))),
stat='count',
nudge_x=0.25,
vjust=-0.2)+
scale_x_discrete(
guide = guide_axis(
n.dodge = 2))+
guides(fill=guide_legend (
title = 'Status'))
-- plot “water_tech_category”
ggplot(data=wpt_nonFunctional,
aes(fct_infreq(
water_tech_category)))+
geom_bar(aes(
fill = water_tech_category),
width = 0.8)+
geom_text(aes(
label = ..count..),
stat = "count",
vjust=-0.2,
size = 3.5,
color = "black")+
scale_x_discrete(guide = guide_axis(
n.dodge = 2))+
guides(fill=guide_legend (
title = 'Water Tech Deployed'))
-- plot “water_source_clean”
ggplot(data=wpt_nonFunctional,
aes(fct_infreq(
water_source_clean)))+
geom_bar(aes(
fill = water_source_clean),
width = 0.8)+
geom_text(aes(
label = ..count..),
stat = "count",
vjust=-0.2,
size = 3.5,
color = "black")+
scale_x_discrete(guide = guide_axis(
n.dodge = 2))+
guides(fill=guide_legend (
title = 'Source of Water Supply'))
3.4.5.8 add wpt_nonFunctional to wp_nga
wp_nga <- wp_nga %>%
mutate(`wp_nonFunctional` = lengths(
st_intersects(bdy_nga, wpt_nonFunctional))) %>%
mutate(`pct_nonFunctional` = (`wp_nonFunctional`/`total_wp`*100))-- replace “NaN” with 0
wp_nga <- wp_nga %>%
mutate(`pct_nonFunctional` = replace_na(pct_nonFunctional, 0))
summary(wp_nga) shapeName geometry total_wp wp_functional
Length:774 MULTIPOLYGON :774 Min. : 0.0 Min. : 0.00
Class :character epsg:4326 : 0 1st Qu.: 45.0 1st Qu.: 17.00
Mode :character +proj=long...: 0 Median : 96.0 Median : 45.50
Mean :122.7 Mean : 67.36
3rd Qu.:168.8 3rd Qu.: 87.75
Max. :894.0 Max. :752.00
pct_functional wp_nonFunctional pct_nonFunctional
Min. : 0.00 Min. : 0.00 Min. : 0.00
1st Qu.: 32.61 1st Qu.: 12.00 1st Qu.: 20.77
Median : 47.41 Median : 33.50 Median : 34.89
Mean : 49.84 Mean : 41.37 Mean : 35.58
3rd Qu.: 66.99 3rd Qu.: 60.00 3rd Qu.: 50.00
Max. :100.00 Max. :278.00 Max. :100.00
3.4.5.9 extract unknown water point
wpt_unknown <- wp_joined1 %>%
filter(status_clean == "Unknown")-- save and read RDS file :: wpt_unknown
write_rds(wpt_unknown,"/jephOstan/ISSS624/class_project/project_2/data/geodata/wpt_unknown.rds")
wpt_unknown <- read_rds("/jephOstan/ISSS624/class_project/project_2/data/geodata/wpt_unknown.rds")3.4.5.10 inspect variable and value
-- reveal value :: “status_clean”
length(wpt_unknown$row_id)[1] 10656
length(wpt_unknown$row_id)/length(wp_joined1$row_id)*100[1] 11.2159
Remarks :
There are 10,656 water points with unknown status, about 11.22% of total water points.
-- determine affected population
sum(wpt_unknown$water_point_population)[1] 18831488
sum(wpt_unknown$local_population_1km)[1] 31418651
3.4.5.11 add wpt_unknown to wp_nga
wp_nga <- wp_nga %>%
mutate(`wp_unknown` = lengths(
st_intersects(bdy_nga, wpt_unknown))) %>%
mutate(`pct_unknown` = (`wp_unknown`/`total_wp`*100))-- replace “NaN” with 0
wp_nga <- wp_nga %>%
mutate(`pct_unknown` = replace_na(pct_unknown, 0))
summary(wp_nga) shapeName geometry total_wp wp_functional
Length:774 MULTIPOLYGON :774 Min. : 0.0 Min. : 0.00
Class :character epsg:4326 : 0 1st Qu.: 45.0 1st Qu.: 17.00
Mode :character +proj=long...: 0 Median : 96.0 Median : 45.50
Mean :122.7 Mean : 67.36
3rd Qu.:168.8 3rd Qu.: 87.75
Max. :894.0 Max. :752.00
pct_functional wp_nonFunctional pct_nonFunctional wp_unknown
Min. : 0.00 Min. : 0.00 Min. : 0.00 Min. : 0.00
1st Qu.: 32.61 1st Qu.: 12.00 1st Qu.: 20.77 1st Qu.: 0.00
Median : 47.41 Median : 33.50 Median : 34.89 Median : 0.00
Mean : 49.84 Mean : 41.37 Mean : 35.58 Mean : 13.76
3rd Qu.: 66.99 3rd Qu.: 60.00 3rd Qu.: 50.00 3rd Qu.: 17.75
Max. :100.00 Max. :278.00 Max. :100.00 Max. :219.00
pct_unknown
Min. : 0.00
1st Qu.: 0.00
Median : 0.00
Mean : 12.55
3rd Qu.: 20.83
Max. :100.00
3.4.5.12 visualise distribution :: “status_clean”
Usage of the code chunk below :
qtm( ) - tmap - to plot a thematic map quickly.
tmap_arrange( ) - tmap - to arrange small multiples in grid layout.
total_wp <- qtm(wp_nga,"total_wp")+
tm_layout(legend.height = 0.3, legend.width = 0.5)
wp_functional <- qtm(wp_nga,"wp_functional")+
tm_layout(legend.height = 0.3, legend.width = 0.5)
wp_nonFunctional <- qtm(wp_nga,"wp_nonFunctional")+
tm_layout(legend.height = 0.3, legend.width = 0.5)
wp_unknown <- qtm(wp_nga,"wp_unknown")+
tm_layout(legend.height = 0.3, legend.width = 0.5)
tmap_arrange(total_wp, wp_functional, wp_nonFunctional, wp_unknown, asp=0, ncol = 2, nrow = 2, widths = 5, heights = 10, sync = TRUE)
3.4.5.13 extract “water_tech_category” to wp_nga
freq(wp_joined1$water_tech_category, sort = "dec") n % val%
Hand Pump 58755 61.8 61.8
Mechanized Pump 25644 27.0 27.0
Unknown 10055 10.6 10.6
Tapstand 553 0.6 0.6
Rope and Bucket 1 0.0 0.0
Remarks :
Only “Hand Pump”, “Mechanized Pump”, and “Tapstand” are to be extracted for further analysis as the rest are either less than 0.5% or “Unknown”.
wtc_hPump <- wp_joined1 %>%
filter(water_tech_category %in%
"Hand Pump")
wtc_mPump <- wp_joined1 %>%
filter(water_tech_category %in%
"Mechanized Pump")
wtc_tStand <- wp_joined1 %>%
filter(water_tech_category %in%
"Tapstand")
wp_nga <- wp_nga %>%
mutate(`total_handPump` = lengths(
st_intersects(bdy_nga, wtc_hPump)
)) %>%
mutate(`total_mechPump` = lengths(
st_intersects(bdy_nga, wtc_mPump)
)) %>%
mutate(`total_tapStand` = lengths(
st_intersects(bdy_nga, wtc_tStand)
)) %>%
mutate(`pct_handPump` = (`total_handPump`/`total_wp`*100)) %>%
mutate(`pct_mechPump` = (`total_mechPump`/`total_wp`*100)) %>%
mutate(`pct_tapStand` = (`total_tapStand`/`total_wp`*100))-- replace “NaN” with 0
wp_nga <- wp_nga %>%
mutate(`pct_handPump` = replace_na(pct_handPump, 0)) %>%
mutate(`pct_mechPump` = replace_na(pct_mechPump, 0)) %>%
mutate(`pct_tapStand` = replace_na(pct_tapStand, 0))
summary(wp_nga) shapeName geometry total_wp wp_functional
Length:774 MULTIPOLYGON :774 Min. : 0.0 Min. : 0.00
Class :character epsg:4326 : 0 1st Qu.: 45.0 1st Qu.: 17.00
Mode :character +proj=long...: 0 Median : 96.0 Median : 45.50
Mean :122.7 Mean : 67.36
3rd Qu.:168.8 3rd Qu.: 87.75
Max. :894.0 Max. :752.00
pct_functional wp_nonFunctional pct_nonFunctional wp_unknown
Min. : 0.00 Min. : 0.00 Min. : 0.00 Min. : 0.00
1st Qu.: 32.61 1st Qu.: 12.00 1st Qu.: 20.77 1st Qu.: 0.00
Median : 47.41 Median : 33.50 Median : 34.89 Median : 0.00
Mean : 49.84 Mean : 41.37 Mean : 35.58 Mean : 13.76
3rd Qu.: 66.99 3rd Qu.: 60.00 3rd Qu.: 50.00 3rd Qu.: 17.75
Max. :100.00 Max. :278.00 Max. :100.00 Max. :219.00
pct_unknown total_handPump total_mechPump total_tapStand
Min. : 0.00 Min. : 0.00 Min. : 0.00 Min. : 0.0000
1st Qu.: 0.00 1st Qu.: 6.00 1st Qu.: 11.00 1st Qu.: 0.0000
Median : 0.00 Median : 47.00 Median : 25.50 Median : 0.0000
Mean : 12.55 Mean : 75.89 Mean : 33.12 Mean : 0.7145
3rd Qu.: 20.83 3rd Qu.:111.00 3rd Qu.: 46.00 3rd Qu.: 0.0000
Max. :100.00 Max. :764.00 Max. :245.00 Max. :42.0000
pct_handPump pct_mechPump pct_tapStand
Min. : 0.00 Min. : 0.00 Min. : 0.0000
1st Qu.: 16.70 1st Qu.: 12.20 1st Qu.: 0.0000
Median : 50.99 Median : 31.27 Median : 0.0000
Mean : 48.73 Mean : 37.54 Mean : 0.5794
3rd Qu.: 77.78 3rd Qu.: 57.71 3rd Qu.: 0.0000
Max. :100.00 Max. :100.00 Max. :32.8947
3.4.5.14 visualise wp_nga distribution :: “water_tech_category”
tmap_mode("view")tmap mode set to interactive viewing
handPump <- tm_shape(bdy_nga)+
tm_polygons(alpha = 0.1) +
tm_shape(wp_nga) +
tm_dots(col = "pct_handPump",
border.col = "gray60",
border.lwd = 0.5) +
tm_view(set.zoom.limits = c(5,9))
mechPump <- tm_shape(bdy_nga)+
tm_polygons(alpha = 0.1) +
tm_shape(wp_nga) +
tm_dots(col = "pct_mechPump",
border.col = "gray60",
border.lwd = 0.5) +
tm_view(set.zoom.limits = c(5,9))
tapStand <- tm_shape(bdy_nga)+
tm_polygons(alpha = 0.1) +
tm_shape(wp_nga) +
tm_dots(col = "pct_tapStand",
border.col = "gray60",
border.lwd = 0.5) +
tm_view(set.zoom.limits = c(5,9))
tmap_arrange(handPump, mechPump, tapStand,
asp=1,
ncol=2,
sync = TRUE)tmap_mode("plot")tmap mode set to plotting
3.4.5.15 extract “usage_capacity” to wp_nga
freq(wp_joined1$usage_capacity, sort = "dec") n % val%
300 68789 72.4 72.4
1000 25644 27.0 27.0
250 573 0.6 0.6
50 2 0.0 0.0
Remarks :
Only “300”, “1000”, and “250” are to be extracted for further analysis as the rest are either less than 0.5% or “Unknown”.
But, “50” will be included in the new variable “total_ucN1000” as part of the none ‘1000’ “usage_capacity” value.
uCap_300 <- wp_joined1 %>%
filter(usage_capacity %in%
"300")
uCap_1000 <- wp_joined1 %>%
filter(usage_capacity %in%
"1000")
uCap_250 <- wp_joined1 %>%
filter(usage_capacity %in%
"250")
uCap_50 <- wp_joined1 %>%
filter(usage_capacity %in%
"50")
wp_nga <- wp_nga %>%
mutate(`total_uc300` = lengths(
st_intersects(bdy_nga, uCap_300)
)) %>%
mutate(`total_uc1000` = lengths(
st_intersects(bdy_nga, uCap_1000)
)) %>%
mutate(`total_uc250` = lengths(
st_intersects(bdy_nga, uCap_250)
)) %>%
mutate(`total_uc50` = lengths(
st_intersects(bdy_nga, uCap_50)
)) %>%
mutate(`total_ucN1000` = ((lengths(
st_intersects(
bdy_nga, uCap_300))) + (lengths(
st_intersects(
bdy_nga, uCap_250))) + (lengths(
st_intersects(
bdy_nga, uCap_50))))
)%>%
mutate(`pct_ucN1000` = (`total_ucN1000`/`total_wp`*100)) %>%
mutate(`pct_uc300` = (`total_uc300`/`total_wp`*100)) %>%
mutate(`pct_uc1000` = (`total_uc1000`/`total_wp`*100)) %>%
mutate(`pct_uc250` = (`total_uc250`/`total_wp`*100))-- replace “NaN” with 0
wp_nga <- wp_nga %>%
mutate(`pct_ucN1000` = replace_na(pct_ucN1000, 0)) %>%
mutate(`pct_uc300` = replace_na(pct_uc300, 0)) %>%
mutate(`pct_uc1000` = replace_na(pct_uc1000, 0)) %>%
mutate(`pct_uc250` = replace_na(pct_uc250, 0))
summary(wp_nga) shapeName geometry total_wp wp_functional
Length:774 MULTIPOLYGON :774 Min. : 0.0 Min. : 0.00
Class :character epsg:4326 : 0 1st Qu.: 45.0 1st Qu.: 17.00
Mode :character +proj=long...: 0 Median : 96.0 Median : 45.50
Mean :122.7 Mean : 67.36
3rd Qu.:168.8 3rd Qu.: 87.75
Max. :894.0 Max. :752.00
pct_functional wp_nonFunctional pct_nonFunctional wp_unknown
Min. : 0.00 Min. : 0.00 Min. : 0.00 Min. : 0.00
1st Qu.: 32.61 1st Qu.: 12.00 1st Qu.: 20.77 1st Qu.: 0.00
Median : 47.41 Median : 33.50 Median : 34.89 Median : 0.00
Mean : 49.84 Mean : 41.37 Mean : 35.58 Mean : 13.76
3rd Qu.: 66.99 3rd Qu.: 60.00 3rd Qu.: 50.00 3rd Qu.: 17.75
Max. :100.00 Max. :278.00 Max. :100.00 Max. :219.00
pct_unknown total_handPump total_mechPump total_tapStand
Min. : 0.00 Min. : 0.00 Min. : 0.00 Min. : 0.0000
1st Qu.: 0.00 1st Qu.: 6.00 1st Qu.: 11.00 1st Qu.: 0.0000
Median : 0.00 Median : 47.00 Median : 25.50 Median : 0.0000
Mean : 12.55 Mean : 75.89 Mean : 33.12 Mean : 0.7145
3rd Qu.: 20.83 3rd Qu.:111.00 3rd Qu.: 46.00 3rd Qu.: 0.0000
Max. :100.00 Max. :764.00 Max. :245.00 Max. :42.0000
pct_handPump pct_mechPump pct_tapStand total_uc300
Min. : 0.00 Min. : 0.00 Min. : 0.0000 Min. : 0.00
1st Qu.: 16.70 1st Qu.: 12.20 1st Qu.: 0.0000 1st Qu.: 15.25
Median : 50.99 Median : 31.27 Median : 0.0000 Median : 59.00
Mean : 48.73 Mean : 37.54 Mean : 0.5794 Mean : 88.85
3rd Qu.: 77.78 3rd Qu.: 57.71 3rd Qu.: 0.0000 3rd Qu.:126.75
Max. :100.00 Max. :100.00 Max. :32.8947 Max. :767.00
total_uc1000 total_uc250 total_uc50 total_ucN1000
Min. : 0.00 Min. : 0.0000 Min. :0.000000 Min. : 0.00
1st Qu.: 11.00 1st Qu.: 0.0000 1st Qu.:0.000000 1st Qu.: 16.00
Median : 25.50 Median : 0.0000 Median :0.000000 Median : 60.00
Mean : 33.12 Mean : 0.7403 Mean :0.002584 Mean : 89.59
3rd Qu.: 46.00 3rd Qu.: 0.0000 3rd Qu.:0.000000 3rd Qu.:127.75
Max. :245.00 Max. :42.0000 Max. :1.000000 Max. :767.00
pct_ucN1000 pct_uc300 pct_uc1000 pct_uc250
Min. : 0.00 Min. : 0.00 Min. : 0.00 Min. : 0.0000
1st Qu.: 39.68 1st Qu.: 38.67 1st Qu.: 12.20 1st Qu.: 0.0000
Median : 67.03 Median : 65.91 Median : 31.27 Median : 0.0000
Mean : 60.78 Mean : 60.17 Mean : 37.54 Mean : 0.6114
3rd Qu.: 87.35 3rd Qu.: 87.02 3rd Qu.: 57.71 3rd Qu.: 0.0000
Max. :100.00 Max. :100.00 Max. :100.00 Max. :32.8947
3.4.5.16 visualise wp_nga distribution :: “usage_capacity”
tmap_mode("view")tmap mode set to interactive viewing
uc300 <- tm_shape(bdy_nga)+
tm_polygons(alpha = 0.1) +
tm_shape(wp_nga) +
tm_dots(col = "pct_uc300",
border.col = "gray60",
border.lwd = 0.5) +
tm_view(set.zoom.limits = c(5,9))
uc1000 <- tm_shape(bdy_nga)+
tm_polygons(alpha = 0.1) +
tm_shape(wp_nga) +
tm_dots(col = "pct_uc1000",
border.col = "gray60",
border.lwd = 0.5) +
tm_view(set.zoom.limits = c(5,9))
uc250 <- tm_shape(bdy_nga)+
tm_polygons(alpha = 0.1) +
tm_shape(wp_nga) +
tm_dots(col = "pct_uc250",
border.col = "gray60",
border.lwd = 0.5) +
tm_view(set.zoom.limits = c(5,9))
tmap_arrange(uc300, uc1000, uc250,
asp=1,
ncol=2,
sync = TRUE)